Logistic Regression with Variables Subject to Post Randomization Method
Abstract
As publicly available databases grow in quality and detail, so does the risk of disclosing the sensitive personal information they contain. The goal of Statistical Disclosure Control (SDC) is to develop methodology that minimizes disclosure risk while providing society with as much of the information needed for valid statistical inference as possible. The Post Randomization Method (PRAM) is a disclosure avoidance method in which values of categorical variables are perturbed via a known probability mechanism and only the perturbed data are released, which raises issues of both disclosure risk and data utility. In this paper, we propose a number of EM algorithms to obtain unbiased estimates for the logistic regression model with data subject to PRAM, thereby accounting for the effects of PRAM and preserving data utility. The effects of the level of perturbation and of sample size on the estimates are evaluated, and relevant standard error estimates are proposed.
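To make the perturbation mechanism concrete, here is a minimal Python sketch of PRAM applied to a single categorical variable. It is an illustration under stated assumptions, not the paper's EM procedure: the transition matrix `P`, the category proportions, the sample size, and the helper names `apply_pram` and `corrected_frequencies` are all hypothetical. The correction shown is a simple moment-based inversion of the known transition matrix for category frequencies, which is a different and much simpler technique than EM estimation for logistic regression.

```python
import numpy as np


def apply_pram(values, transition, rng):
    """Perturb integer category codes with a known transition matrix.

    Row k of `transition` holds P(released = j | true = k) for each j.
    """
    values = np.asarray(values)
    transition = np.asarray(transition, dtype=float)
    if not np.allclose(transition.sum(axis=1), 1.0):
        raise ValueError("each row of the transition matrix must sum to 1")
    n_categories = transition.shape[0]
    # Inverse-CDF sampling: compare one uniform draw per record against the
    # cumulative probabilities of the row matching its true category.
    cumulative = transition.cumsum(axis=1)
    draws = rng.random(len(values))
    released = (draws[:, None] > cumulative[values]).sum(axis=1)
    # Guard against floating-point rows that sum to slightly less than 1.
    return np.minimum(released, n_categories - 1)


def corrected_frequencies(released, transition):
    """Moment-based estimate of the true category proportions.

    Uses E[observed proportions] = transition.T @ true proportions.
    """
    transition = np.asarray(transition, dtype=float)
    n_categories = transition.shape[0]
    observed = np.bincount(released, minlength=n_categories) / len(released)
    return np.linalg.solve(transition.T, observed)


# Hypothetical setup: a binary variable with true proportions 0.7 / 0.3.
rng = np.random.default_rng(0)
true_values = rng.choice(2, size=20000, p=[0.7, 0.3])

# Hypothetical PRAM matrix: category 0 is kept with probability 0.9,
# category 1 with probability 0.8.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

released = apply_pram(true_values, P, rng)
naive = np.bincount(released, minlength=2) / len(released)
estimate = corrected_frequencies(released, P)

print("naive proportions from released data:", naive)
print("proportions corrected for PRAM:      ", estimate)
```

In this sketch the naive proportions computed from the released data are pulled toward the mixture implied by `P`, while the corrected proportions should land near the true ones because `P` is known to the analyst. The same idea, that the known perturbation mechanism can be accounted for at estimation time, is what motivates handling PRAM inside the likelihood of a regression model.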
Similar resources
Comparison of Random Forest and Logistic Regression Methods in Predicting Mortality in Colorectal Cancer Patients and its Related Factors
Background and Objectives: The purpose of this study was to predict mortality from colorectal cancer in Iranian patients and to determine the factors affecting mortality in these patients, using random forest and logistic regression methods. Methods: Data from 304 patients with colorectal cancer registry from the Gastroenterology and Liver Research Center of Shah...
Randomization Does Not Justify Logistic Regression
The logit model is often used to analyze experimental data. However, randomization does not justify the model, so the usual estimators can be inconsistent. A consistent estimator is proposed. Neyman’s non-parametric setup is used as a benchmark. In this setup, each subject has two potential responses, one if treated and the other if untreated; only one of the two responses can be observed. Besi...
Multi-period monitoring and prediction of forest cover loss using logistic regression model in Arasbaran catchments
Knowledge and understanding of changes in forest cover in relation to environmental factors (topography) can provide valuable guidance for conservation and protection. The purpose of this study was to identify, quantify and predict deforestation in relation to topographic variables using a logistic regression model. The Arasbaran catchments (Naposhtehchay, Ilginehchay and Mardanqumchay) in ...
Comparison of artificial neural network with logistic regression in prediction of tendency to surgical intervention in nurses
Introduction: Logistic regression is one of the modeling methods for binary dependent variables. The artificial neural network, by contrast, is a flexible method with few restrictions. The growing number of unnecessary cosmetic surgeries and the importance of prediction and classification motivated the present study, which aims to compare logistic regression and artificial ...
A review of logistic regression models used to predict post-fire tree mortality of western North American conifers
Abstract. Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed burns and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate and interpret log...